Identification of a ferroptosis-related gene pair biomarker with immune infiltration landscapes in ischemic stroke: a bioinformatics-based comprehensive study

Background Ischemic stroke (IS) is a principal contributor to long-term disability in adults. A new cell death mediated by iron is ferroptosis, characterized by lethal aggregation of lipid peroxidation. However, a paucity of ferroptosis-related biomarkers early identify IS until now. This study investigated potential ferroptosis-related gene pair biomarkers in IS and explored their roles in immune infiltration. Results In total, we identified 6 differentially expressed ferroptosis-related genes (DEFRGs) in the metadata cohort. Of these genes, 4 DEFRGs were incorporated into the competitive endogenous RNA (ceRNA) network, including 78 lncRNA-miRNA and 16 miRNA-mRNA interactions. Based on relative expression values of DEFRGs, we constructed gene pairs. An integrated scheme consisting of machine learning algorithms, ceRNA network, and gene pair was proposed to screen the key DEFRG biomarkers. The receiver operating characteristic (ROC) curve witnessed that the diagnostic performance of DEFRG pair CDKN1A/JUN was superior to that of single gene. Moreover, the CIBERSORT algorithm exhibited immune infiltration landscapes: plasma cells, resting NK cells, and resting mast cells infiltrated less in IS samples than controls. Spearman correlation analysis confirmed a significant correlation between plasma cells and CDKN1A/JUN (CDKN1A: r = − 0.503, P < 0.001, JUN: r = − 0.330, P = 0.025). Conclusions Our findings suggested that CDKN1A/JUN could be a robust and promising gene-pair diagnostic biomarker for IS, regulating ferroptosis during IS progression via C9orf106/C9orf139-miR-22-3p-CDKN1A and GAS5-miR-139-5p/miR-429-JUN axes. Meanwhile, plasma cells might exert a vital interplay in IS immune microenvironment, providing an innovative insight for IS therapeutic target. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-022-08295-0.

Despite recombinant tissue plasminogen activator (rtPA) has been a mainstay to salvage the ischemic tissue, only a few IS victims can actually benefit from it due to the limitations of narrow therapeutic window, high economic expenditure, and hemorrhage-related complications [4]. Thus, uncovering the potential molecular mechanism and exploring the innovative therapeutic target for IS have been top priorities.
Ferroptosis is a newly recognized type of regulated cell death involved with the intracellular iron-mediated toxic accumulation of lipid peroxidation [5]. Unlike the mitochondrial morphology of other types of cell death, the shrunken mitochondria, increased membrane densities, reduced or disappeared mitochondria crista along with intact nucleus can be observed in ferroptosis under transmission electron microscopy [6][7][8]. The possible molecular mechanisms of ferroptosis involve abnormal iron metabolism, lipid peroxidation, and some critical enzymes (like GPX4) [9]. Although ferroptosis is first formally put forward in tumors [10], a growing body of work has demonstrated that it is also related to ischemic events, such as in intestinal, lung, and renal [11][12][13]. In neurology, ferroptosis has been confirmed to participate in intracerebral hemorrhage, subarachnoid hemorrhage, Alzheimer's disease, amyotrophic lateral sclerosis, and Parkinson's disease [14][15][16][17][18]. Preliminary evidence reveals that ferroptosis deteriorates ischemia-induced brain damage, while the administration of ferrostatin-1 (an inhibitor of ferroptosis) can effectively reverse induced damage [19,20]. Recently, increasing studies have focused on finding biomarkers of ferroptosis in multiple levels, including morphology, biochemistry, protein, and gene [21]. For example, Guozhong Chen et al. [22] reported that MAP1LC3B, PTGS2, and TLR4 could be potential ferroptosis-related biomarkers for IS via bioinformatics, and further explored potential therapeutic compounds, such as Zinc11679756 (Eltrombopag). However, exploring the reliable predictive gene pair biomarkers and specific regulatory details pertaining to ferroptosis in IS still be enormous challenges.
The ENCODE project changes the perception of noncoding RNAs from junks to essential regulators of cellular homeostasis and disruption [23,24]. In eukaryotes, noncoding RNAs exert their biological effects in the forms of small (transcripts < 200 nucleotides) RNAs (eg, microRNAs [miRNAs]) and long (transcripts > 200 nucleotides) RNAs (eg, long noncoding RNAs [lncR-NAs]) [25]. With the continuous improvement and maturity of RNA sequencing technology and bioinformatics method, a bulk of lncRNAs have been demonstrated to cross-regulate the stability of mRNA at the post-transcriptional level by serving as competing endogenous RNAs (ceRNAs) for shared miRNAs [26]. It is reported that ischemic nerve cells appear more sensitive to the aberrant expression of noncoding RNAs, which affect apoptosis, inflammation, proliferation, autophagy, and angiogenesis [27]. Therefore, further in-depth understanding of these genes may provide a novel perspective for identifiable biomarkers and therapeutic frontier in IS.
As we know, multiple immune cells infiltrate into the ischemic parenchyma from peripheral circulation via broken BBB after IS, triggering innate and adaptive immune responses [28]. Actually, the exact functional roles of activated and infiltrated immune cells depend on the ischemic microenvironment at different phases of IS. For example, neutrophils, dendritic cells, and monocytes appear in the stroked brain, exacerbating neuroinflammatory response by releasing complements, cytokines, cytolysis, as well as interacting with other cells at an early stage after IS. T and B cells infiltrate into the injured brain at chronic stages of IS, facilitating neuron repair and prompting functional recovery [29]. Up to now, few studies have used the CIBERSORT tool to analyze immune infiltration in IS. Hence, evaluating the landscapes of immune infiltration during IS process is of vital significance for advanced targeted therapeutics.
In this study, we firstly identified the differently expressed ferroptosis-related genes (DEFRGs) between IS and control samples from the metadata cohort and ferroptosis-related dataset. According to four independent databases, a ceRNA network was constructed. Then, we used relative expression values of DEFRGs to establish gene pairs. After integrated analysis among least absolute shrinkage and selection operator (LASSO) regression, support vector machine (SVM), ceRNA, and gene pair, we incorporated the key gene-pair biomarker to plot the receiver operating characteristic (ROC) curve and compared their diagnostic capability for IS between gene pair and single gene. Importantly, we further explored the potential regulatory mechanisms of this new biomarker from the perspectives of ceRNA and immune infiltration in IS.

Dataset acquisition and data preprocessing
We searched the Gene Expression Omnibus (GEO) database (https:// www. ncbi. nlm. nih. gov/ geo) on 10 September 2020 using the following retrieval conditions: "ischemic stroke" AND "Homo sapiens" AND "gse" AND "Expression profiling by array". We included the gene expression profiling of whole blood or peripheral blood of IS patients or control samples. Profiles with incomplete data, related to cell lines, and associated with other diseases were excluded. Then, two mRNA-sequence datasets (GSE22255 and GSE16561) of 103 patients were retrieved and collected for analysis. Besides, we also downloaded the GSE140275 profile for construction of ceRNA network, which included lncRNA and mRNA expression values of IS. GSE140275 consisted of 3 control and 3 IS samples [30], while GSE22255 was made up of 20 control and 20 IS samples [31]. GSE140275 and GSE22255 datasets served as a discovery cohort. GSE16561 included 24 control and 39 IS samples [32][33][34], serving as a validation cohort. More detailed information about the three datasets was shown in Table 1. Additionally, a 259 ferroptosis-related genes dataset was fetched from FerrDb [35], including 108 drivers, 69 suppressors, and 111 markers (29 were overlapped genes among them) (More details were shown in Supplemental Table 1). The raw data were preprocessed by the following means: 1) merging GSE140275 and GSE22255 into a metadata cohort to enlarge sample size; 2) carrying out batch normalization to offset the deviations between two datasets using R's "sva (v3.34.0)" package. Because all datasets were publicly accessible from the GEO database or FerrDb database, and the ethics committee approval of the Second Affiliated Hospital of Xi'an Jiaotong University was not required to conduct the current study. Thus, all data were freely available. The workflow and data preprocessing were illustrated in Fig. 1.

Establishment of gene pair
On the basis of DEFRGs, we established differentially expressed ferroptosis-related gene pairs (DEFRGPs). This method could overcome the technical noise and heterogeneity between different datasets, which has been proven to be effective and reliable [42]. Briefly, each gene pair was calculated by pairwise comparison of the expression value for a given sample. Once the expression value of DEFRG-1 was higher than that of the DEFRG-2 in a specific DEFRGP, the output was defined as 1; otherwise, the output was defined as 0. DEFRG-1 and DEFRG-2 represented any two different DEFRGs. Therefore, 0's and 1's formed a ferroptosis-related gene pair. Only the mutual

Screening of key DEFRGP biomarker
As a state-of-the-art machine learning algorithm for binary classification, SVM classifies data points by finding a decision boundary to predict labels based on one or more variable vectors [43]. This decision boundary, also called the hyperplane, keeps the margin between classes as far apart as possible [44]. In this study, we addressed the diagnosis prediction of IS as a classification problem (i.e., whether a sample was identified as IS or control). To improve the accuracy of predicting IS outcome, the LASSO regression algorithm was also used to reduce genes dimensionality via seeking for the optimal penalty parameter-λ, which was determined by minimal binomial deviance. To avoid our data suffering from overfitting and find more stable SVM and LASSO regression models, we also performed five-fold cross-validation in these two processes. There were three steps: step 1, divided the data into five equal piles; step 2, selected one pile as testing and the other four piles as training to fit the model; step 3, repeated step 2 five folds in total with different testing selected each time until testing was performed on all five piles. All DEFRGs were subjected to SVM and LASSO regression using R's "caret (v6.0-88)", "glmnet (v4.1-2)" packages, respectively. The random seed was set to 3 in all SVM progress, and 214 in all LASSO regression progress. Besides, a venn diagram visualized the key DEFRGP biomarker from the results of LASSO regression, SVM, ceRNA, and gene pair.

Diagnostic performance of key DEFRGP biomarker in IS
We accessed the diagnostic performance of the key DEFRGP biomarker in the discovery dataset (GSE140275 and GSE22255) using GraphPad Prism software (v8.0.1, GraphPad, Inc., La Jolla, CA, USA). The expression values of gene pair joint diagnosis were obtained via logistic regression. Herein, we compared the discriminative power of single gene and gene pair in terms of the area under the ROC curve (AUCROC), 95% confidence interval (CI), specificity along with sensitivity. Then, we verified the diagnostic performance of key biomarker to distinguish patients with IS from controls in the external validation cohort (GSE16561).

Immune infiltration analyses
Meanwhile, we quantified the relative percentages of immune cells in each sample using mRNA expression values by CIBERSORT, a classic deconvolution approach based on linear support vector regression [45]. Herein, we performed "CIBERSORT (http:// ciber sort. stanf ord. edu, accessed on 03 February 2016)" and "parallel", "e1071 (v1.7-8)", "preprocessCore (v1. 48.0)" packages in R to analyze. The relative percentages of 22 immune cell subpopulations in each individual were visualized by a bar plot. R's "corrplot (v0.90)" package was applied to visualize the association of all cell subpopulations in the form of a correlation heatmap, while the "ggplot2 (v3.3.0)" package was utilized to reflect the infiltrating difference between IS and control samples via violin diagram. P < 0.05 was accepted as a cut-off value.

Statistical analyses
All statistical analyses and graphical work were processed by R software (version 3.6.2, Vienna, Austria). Venn diagrams were conducted using an online tool (http:// bioin forma tics. psb. ugent. be/ webto ols/ Venn/). The ROC analysis was visualized using GraphPad Prism software (v8.0.1, GraphPad, Inc., La Jolla, CA, USA). Continuous variables were expressed as mean ± SD, and differences between two groups were compared using Student's t-test for normally distributed variables and Mann-Whitney U test for abnormally distributed variables. Differential expression analysis was performed with the cut-off thresholds of P < 0.05 and |log 2 FC| > 0.58 or |FC| > 1.5, which was consistent with previously reported methods [36,37]. For each study, P < 0.05 was considered as a significant difference.

Functional enrichment analysis of 6 DEFRGs
KEGG enrichment uncovered the above 6 DEFRGs principally participated in renal cell carcinoma, colorectal cancer, rheumatoid arthritis, breast cancer, IL-17 signaling pathway, endocrine resistance, TNF signaling pathway, oxytocin signaling pathway, mTOR signaling pathway, hepatitis B, NOD-like receptor signaling pathway ( Fig. 3a and Supplemental Table 2). BP was mainly enriched in the response to glucocorticoid, response to starvation, intrinsic apoptotic signaling pathway in response to DNA damage by p53 class mediator, response to lipopolysaccharide, positive regulation of fibroblast proliferation, and response to steroid hormone. CC showed that 6 DEFRGs were related to nuclear euchromatin, euchromatin, cyclin-dependent protein kinase holoenzyme complex, transcriptional repressor complex, and serine/threonine protein kinase complex. In MF, 6 DEFRGs were mainly associated with ubiquitin protein ligase binding, cAMP response element binding, HMG box domain binding, R-SMAD binding, cyclin binding, and neutral amino acid transmembrane transporter activity (Fig. 3c and Supplemental Table 3).

Construction of ceRNA network
The miRcode database prediction displayed 19 DElncR-NAs had binding sites with 202 miRNAs among identified 482 DElncRNAs. And the combined prediction of miRDB, miRTarBase, and TargetScan databases demonstrated that 54 obtained miRNAs could bind to 8605 target mRNAs among the aforesaid 202 miRNAs (Figs. 1  and 3b). We merged these 8605 predictive mRNAs with

Screening for key DEFRGP biomarker
In the metadata cohort, we totally established 2 meaningful DEFRGPs: CDKN1A/JUN, JUN/SLC7A5. SVM created a hyperplane for 2 DEFRGs (CDKN1A, JUN) at the seed of 3 (Fig. 4a). While LASSO regression obtained the minimum binomial deviance at the seed of 214 (Fig. 4b, c), keeping 6 DEFRGs perfectly. The venn diagram displayed that CDKN1A/JUN was screened as the key  (Fig. 5a).

Diagnostic performance of CDKN1A/JUN in IS
The diagnostic performances of the key gene pair (CDKN1A/JUN) and genes (CDKN1A and JUN) to distinguish patients with IS and controls were appraised via ROC analysis in the discovery and validation cohorts.

Immune infiltration landscapes
To understand the roles of CDKN1A/JUN in the brain microenvironment during IS process, we investigated immune cells landscapes and their relationship with CDKN1A/JUN. The bar plot clearly showed that the contents of varied subpopulations in each individual (Fig. 6a). Correlation heatmap between 22 immune cell subpopulations in IS revealed that M1 macrophages were positively correlated with resting dendritic cells. Regulatory T cells displayed distinct associations with M0 macrophages, memory B cells, and neutrophils. M0 macrophages were positively correlated with memory B cells and neutrophils, and the latter two also showed a significantly positive correlation (Fig. 6b).
The violin diagram showed that compared with control samples, plasma cells, resting NK cells, and resting mast cells were all presented with lower infiltrates in IS samples (Fig. 6c). Intriguingly, we found a substantial relationship between CDKN1A/JUN and plasma cells among the three infiltrating cell subpopulations. Specifically, CDKN1A had a negative correlation with plasma cells (r = − 0.503, P < 0.001) ( Fig. 7a and Supplemental Table 4), while JUN was not only negatively correlated with plasma cells (r = − 0.330, P = 0.025) but also with resting NK cells (r = − 0.318, P = 0.031) (Fig. 7b and Supplemental Table 5). However, CDKN1A had no correlation with resting NK cells (Fig. 7a). Besides, the resting mast cell was not associated with either CDKN1A or JUN. These results suggested that CDKN1A/JUN could partly reflect the condition of the brain microenvironment in IS.

Exploration of potential regulatory axes for CDKN1A/JUN
After screening out the robust CDKN1A/JUN biomarker to identifying patients with IS, we further explored their regulatory axes according to the ceRNA network. Given that a miRNA could bind to multiple lncRNAs and 14 lncRNAs were too many to accurately mine the potential  The dot with a larger size has a stronger correlation coefficient. The p value is presented by different color, the dot with a more purple color has a smaller p value, while the dot with a yellower color has a larger p value Fig. 8 Refining for core lncRNA using SVM and ceRNA regulatory axes for CDKN1A/JUN. a Refining for core lncRNA by SVM. b-c The ceRNA regulatory axes for CDKN1A/JUN biomarker. b Sankey plot displays the interacted miRNAs for CDKN1A and JUN. c Sankey plot displays the interacted lncRNAs for the above miRNA ceRNA axes for CDKN1A/JUN biomarker. Accordingly, SVM was utilized again to refine the upstream regulated lncRNAs, refining 3 core lncRNAs (C9orf106, C9orf139, GAS5) at the seed of 9999 (Fig. 8a). In terms of CDKN1A, only miR-22-3p had a binding site for it, while C9orf106 and C9orf139 could co-regulate miR-22-3p; for JUN, miR-139-5p and miR-429 could co-bind to it, while both of them were regulated by GAS5 (Fig. 8b, c).

Discussion
Despite the great improvement that has been made in IS treatment over decades, the thrombolytic therapy is nevertheless not satisfactory, and a number of patients still suffer from long-term disabilities. Along with the development of sequencing technology, an expectation increasingly grows that exploring the biological effects of noncoding RNAs in the non-oncology field, especially in stroke. Ferroptosis has been confirmed to be involved in the occurrence and development of IS. However, very few studies have focused on ferroptosis-related genes and potential regulatory details as well as the immune infiltration landscapes, which has profound significance for IS patients. In the current study, totally 6 DEFRGs were identified in the metadata cohort. The integration scheme among LASSO regression, SVM, ceRNA, and gene pair revealed that CDKN1A/JUN was the key ferroptosis-related gene pair. In addition, CIBERSORT was applied to analyze immune infiltration in IS.
Compared to prior studies [46], a merge of datasets and integration scheme were two dominant advantages in this study. On the one hand, merging datasets allowed for a larger sample size to incorporate more DEFRGs, which was conducive to subsequent machine learning analysis. On the other hand, the ROC analysis results showed that the superior discriminative effectiveness of CDKN1A/JUN gene pair than single gene (CDKN1A and JUN) both in the discovery and validation cohorts, implying that the integration scheme was feasible and reliable. Cyclin-dependent kinase inhibitor 1A (CDKN1A), also known as p21 WAF1/Cip1 , interacts with cyclin-dependent kinases to exert its activity [47]. Accumulating evidence implied that up-regulated expression of CDKN1A led to the block of cell cycle at the G1 phase in a p53-dependent or p53-independent manner, and then induced apoptosis [48][49][50]. Apart from apoptosis, CDKN1A was also reported to induce autophagy via ursolic acid [51]. One previous clinical research indicated that CDKN1A participated in the proliferation of mesenchymal stem cells in humans with IS serum [52]. Besides, CDKN1A could impact peroxide metabolism, such as glutathione, making it an ideal candidate for detecting ferroptosis [53]. Considering that CDKN1A always involved in cell death, it was reasonable to speculate that it played a pivotal role in ferroptosis via regulating the cell cycle process in IS. The c-JUN protein encoded by JUN, acting as a transcription factor, dimerizes with Maf/Nrl families or Fos families to regulate gene transcription [54]. In mammals, c-JUN takes part in diverse cell activities and pathophysiologic processes, including proliferation, differentiation, senescence, apoptosis, neuronal development, inflammations, tumorigenesis as well as cellular transformation [55,56]. Li Y et al. confirmed that the expression of c-JUN was up-regulated, which aggravated cerebral ischemia/reperfusion (I/R) induced injury [57]. As the downstream effector of the JNK pathway, c-JUN was found to regulate inflammation and cell death in the ischemic brain. Moreover, c-JUN was also implicated to regulate cell pyroptosis and activate the NLRP3 inflammasome [58]. Thus, we reckoned that JUN played a momentous part in the progression of IS. Known evidence from prior researches together with our findings suggested that the functions and effects of CDKN1A and JUN in IS should be the center of investigations in the near future.
A plethora of data has indicated the participation of ferroptosis in immunity [5,59]. Ferroptotic cells can activate innate immunity and release pro-inflammatory factors in various diseases (including myocardial I/R injury, and glioma), recruiting lots of immune cells [60]. When IS occurs, the breakdown of BBB allows immune cells to flood into the central nervous system. For instance, Meng H et al. found that double-negative T cells were gradually increased in a time-dependent manner, amplifying proinflammatory microglia and prompting brain injury in IS patients or MCAO mice [61]. Our findings suggested that plasma cells, resting NK cells, and resting mast cells infiltrated less in IS samples compared to controls. Remarkably, only plasma cells were linked to CDKN1A/JUN biomarker simultaneously. Kong Y et al. [62] reported that the number of NK cells reduced in IS patients, keeping in accordance with our findings. Mast cells played a detrimental role in IS by accelerating BBB disruption and magnifying neuroinflammation via releasing cytokines [63]. Another in vivo model confirmed that the inflammation in infarcts, especially mediated by increased B cells or plasma cells, contributed to post-stroke cognitive impairment by secreting antibodies or complements [64], which was contrary to our findings. One plausible explanation was that our gene expression data came from the serum, not the brain tissue, plasma cells would migrate from circulation to ischemic brain tissue to protect neurons from inflammatory damage after IS. Further research is warranted to address this contradictory issue.
The potential regulatory mechanisms through which CDKN1A/JUN carried out are also noteworthy. Based on the ceRNA network and screen of core lncRNAs via SVM, our results showed potential regulatory axes for CDKN1A and JUN. Both C9orf106 and C9orf139 could act as sponges of miR-22-3P, competing bind to up-regulated CDKN1A. Through promoting the polarization of macrophages, miR-22-3p inhibited inflammation and lessened the spinal cord I/R injury [65]. Nevertheless, there were few works about C9orf106 and C9orf139. Both miR-139-5p and miR-429 had binding sites for JUN, while GAS5 abolished this effect via sponging miR-139-5p or miR-429. Several lines of evidence showed that GAS5 was extensively related to neuronal apoptosis and differentiation [66]. And a previous study implicated that the miR-139-5p/c-JUN-initiated pathway regulated the function of diabetic endothelial cells, providing an experimental basis for our findings [67]. Another evidence revealed that miR-429 inhibited hepatocyte proliferation via negatively regulating c-JUN [68]. Unfortunately, the expression profile for miRNAs was rare in the GEO database. Hence, it is undeniable that more work is needed to explore and confirm our findings.
According to KEGG enrichment analysis, these 6 DEFRGs principally participated in TNF signaling pathway, mTOR signaling pathway, NOD-like receptor signaling pathway. Existing evidence lent support to the idea that these pathways affected the initiation and progression of IS. For example, TNF and IL-1β accelerated inflammatory lesions in IS via NOD-like receptor signaling pathway [69]. GO enrichment analysis revealed that these genes mainly enriched in intrinsic apoptotic signaling pathway in response to DNA damage by p53 class mediator in BP, correlating with cyclin-dependent protein kinase holoenzyme complex. Previous research supported our findings, indicating that DNA damagesignaling pathway responses aggravated brain I/R injury and this process could be attenuated by chloroquine [70]. Cyclin-dependent protein kinase holoenzyme complex was a crucial regulator in the cell cycle involved in many processes, including apoptosis, senescence, and autophagy. Additionally, they were mainly associated with cAMP response element binding in MF, whose phosphorylation was indispensable for the decrease in oxygen-glucose deprivation and reoxygenation-induced apoptosis of astrocytes [71].
Despite we merged two different datasets to enlarge the sample size, there were several limitations that should be acknowledged. First, it was a retrospective analysis, bringing about an inherent bias. Second, the profiles of our datasets were from the blood samples instead of the brain tissue, their reliability should be verified henceforth. Third, the CDKN1A/JUN biomarker was constructed on RNA sequences, their reproducibility and wide applicability need to be validated using experimental or clinical samples.

Conclusions
Taken together, our findings indicated that the CDKN1A/ JUN was a robust and promising diagnostic biomarker for identifying the patients with IS, which might regulate ferroptosis during the progression of IS via C9orf106/ C9orf139-miR-22-3p-CDKN1A and GAS5-miR-139-5p/ miR-429-JUN axes. Meanwhile, plasma cells might exert a vital interplay in the new gene pair biomarker's diagnostic ability, providing an innovative insight for IS therapeutic target. Further biological experiments in combination with larger prospective clinical samples are warranted to verify the functions and effects of this new gene pair biomarker in IS.